Written
Corpus
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GByte
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | /N |
Documentation:
None
Written
Corpus
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
<Not Specified>
Size:
22.10G tokens
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | /N |
Documentation:
Yes, on the website.
Written
Lexicon
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
CreativeCommons Attribution 4.0 International
Size:
41 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | /N |
Documentation:
Yes, on the website.
Speech
Corpus
Language Type:
Bilingual
Languages:
French Nisvai
Availability:
From Owner
License:
Proprietary
Size:
31,552 tokens
Production Status:
Newly created-in progress
Use:
Corpus Creation/Annotation
- Paper title: The Nisvai Corpus of Oral Narrative Practices from Malekula (Vanuatu) and its Associated Language Resources
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jocelyn Aznar | The Nisvai Narrative Corpus | /N |
Documentation:
No documentation
Multimodal/Multimedia
Image Analyzer
Language Type:
Multilingual
Languages:
Dutch English French German Modern Greek Portuguese
Availability:
Freely Available
License:
European Union Public License 1.2
Size:
200 MByte
Production Status:
Newly created-finished
Use:
Knowledge Discovery/Representation
- Paper title: Immersive Language Exploration with Object Recognition and Augmented Reality
- Paper track: Multimodality/poster presentation with demo
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Benny Platte | ARTranslate Open Source XCode Project | /N |
Documentation:
https://github.com/benpla/ARTranslate/blob/master/README.md
Written
Corpus
Language Type:
Multilingual
Languages:
English French Portuguese Spanish
Availability:
Freely Available
License:
Size:
300 Other
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors’ Abstract Writing Practice
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aurélie Névéol | MEDLINE parallel corpus | /N |
Documentation:
None
Written
Corpus
Language Type:
Multilingual
Languages:
Dutch English French Portuguese
Availability:
Freely Available
License:
Apache-2.0
Size:
31403 translation units Other
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Julia Ive | Post-Editing Dataset in the Legal Domain | /N |
Documentation:
None
Written
Lexicon
Language Type:
Bilingual
Languages:
English French
Availability:
Freely Available
License:
Size:
492 entries
Production Status:
Newly created-finished
Use:
Word Sense Disambiguation
- Paper title: Dataset for Temporal Analysis of English-French Cognates
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Antoine Doucet | List of English-French Cognates | /N |
Documentation:
None
Written
Corpus
Language Type:
Bilingual
Languages:
English French
Availability:
Metadata freely available and full texts available only for the French higher education and research community
License:
Open Licence Etalab for metadata and publisher type ISTEX licence for full texts
Size:
1161 KByte
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: An Experiment in Annotating Animal Species Names from ISTEX Resources
- Paper track: Evaluation/poster presentation
- Paper status: Accept Poster+DemoSuggested
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sabine Barreaux | Animalia 100 | /N |
Documentation:
None
Written
Corpus
Language Type:
Monolingual
Languages:
French
Availability:
From Owner
License:
Creative Commons
Size:
52704 tokens
Production Status:
Newly created-finished
Use:
text simplification, reading evaluation
- Paper title: Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Núria Gala | Alector Corpus | /N |
Documentation:
https://alectorsite.wordpress.com/




